ABSTRACT

This paper introduces a new framework for learning multiscale sparse
representations of natural images with overcomplete dictionaries. Our
work extends the K-SVD algorithm [1], which learns sparse single-scale
dictionaries for natural images. Recent work has shown that the K-SVD
can lead to state-of-the-art image restoration results [2, 3]. We show
that these are further improved with a multiscale approach, based on a
Quadtree decomposition. Our framework provides an alternative to
multiscale pre-defined dictionaries such as wavelets, curvelets, and
contourlets, with dictionaries optimized for the data and application
instead of pre-modelled ones.

Index  Terms —  Image  Restoration,  Denoising,  Multiscale,  Sparsity 

1.  INTRODUCTION 

Consider a signal $\mathbf{x} \in \mathbb{R}^n$. We say that it admits a
sparse approximation over a dictionary $\mathbf{D} \in \mathbb{R}^{n \times k}$,
composed of $k$ elements referred to as atoms, if one can find a linear
combination of a “few” atoms from $\mathbf{D}$ that is “close” to the
signal $\mathbf{x}$. The so-called Sparseland model suggests that such
dictionaries exist for various classes of signals, and that the sparsity
of a signal decomposition is a powerful model in many image processing
applications [1, 2, 3].

Another important assumption, commonly and successfully used in image
processing, is the existence of multiscale features in images. Trying to
design the best multiscale dictionary which fulfils a sparsity criterion
has been a major challenge. Such attempts include wavelets, curvelets,
contourlets, wedgelets, bandlets, and steerable wavelets (see for example
[4] and references therein). These methods lead to many effective
algorithms in image processing, e.g., image denoising [5].

In [1] the K-SVD is proposed for learning a single-scale dictionary for
sparse representation of image patches. By means of a sparsity prior on
all fixed-sized overlapping patches in the image, the K-SVD is used for
removing white Gaussian noise, leading to a highly efficient algorithm
[2]. This has been recently extended to color images, with
state-of-the-art results in denoising, inpainting, and demosaicing
applications [3]. In this paper, we extend the basic K-SVD work,
providing a framework for learning multiscale and sparse representations
of images. In addition to presenting the new framework, we apply it to
denoising, obtaining results that outperform reference works such as
[2, 5, 6] and compete favorably with the most recent state-of-the-art
method in this field [7].

The task of learning a multiscale dictionary has been addressed in [8]
in the general context of sparsifying image content. Our approach differs
from this work in many ways, including: (i) their training algorithm
employs a simple steepest descent while ours uses more effective
iterations, thus leading to faster convergence; (ii) the structure of the
multiscale process; and (iii) the way the found dictionaries are deployed
for denoising is entirely different, as we base our algorithm on the
energy minimization method introduced in [2]. This explains the superior
performance we obtain.

2.  THE  SINGLE-SCALE  K-SVD  DENOISING  ALGORITHM 

In this section, we briefly review the main ideas of the K-SVD framework
for sparse image representation and denoising. The reader is referred to
[1, 2, 3] for more details.

Let $\mathbf{x}_0$ be a clean image and $\mathbf{y} = \mathbf{x}_0 + \mathbf{w}$
its noisy version, with $\mathbf{w}$ being an additive zero-mean white
Gaussian noise with a known standard deviation $\sigma$. The algorithm
aims at finding a sparse approximation of every $\sqrt{n} \times \sqrt{n}$
overlapping patch of $\mathbf{y}$, where $n$ is fixed a priori. This
representation is done over an adapted dictionary $\mathbf{D}$, learned
for this set of patches. These approximations of patches are averaged to
obtain the reconstructed image. This algorithm (shown in Figure 1) can be
described as the minimization of an energy:

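$$\{\hat{\alpha}_{ij}, \hat{\mathbf{D}}, \hat{\mathbf{x}}\} = \arg\min_{\mathbf{D},\alpha_{ij},\mathbf{x}} \; \lambda\|\mathbf{x}-\mathbf{y}\|_2^2 + \sum_{ij}\mu_{ij}\|\alpha_{ij}\|_0 + \sum_{ij}\|\mathbf{D}\alpha_{ij}-\mathbf{R}_{ij}\mathbf{x}\|_2^2. \qquad (1)$$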

In this equation, $\hat{\mathbf{x}}$ is the estimator of $\mathbf{x}_0$,
and the dictionary $\hat{\mathbf{D}} \in \mathbb{R}^{n \times k}$ is an
estimator of the optimal dictionary which leads to the sparsest
representation of the patches in the recovered image. The indices $[i,j]$
mark the location of the patch in the image (representing its top-left
corner). The vectors $\hat{\alpha}_{ij} \in \mathbb{R}^k$ are the sparse
representations for the $[i,j]$-th patch in $\hat{\mathbf{x}}$ using the
dictionary $\hat{\mathbf{D}}$. The notation $\|\cdot\|_0$ is the $\ell^0$
quasi-norm, a sparsity measure, which counts the number of non-zero
elements in a vector. The operator $\mathbf{R}_{ij}$ is a binary matrix
which extracts the square $\sqrt{n} \times \sqrt{n}$ patch of coordinates
$[i,j]$ from the image written as a column vector. The main steps of the
algorithm are (refer to Figure 1):

Sparse Coding Step: This is performed with an Orthogonal Matching Pursuit
(OMP) [9], which proves to be very efficient for diverse approximation
problems [10]. The approximation stops when the residual reaches a sphere
of radius $\sqrt{n}\,C\sigma$ representing the probability distribution
of the noise. More on this is found in [3].

Dictionary Update: This is a sequence of rank-one approximation problems
that update both the dictionary atom and the sparse representations that
use it.

Reconstruction: The last step is a simple averaging between the patch
approximations and the noisy image. The denoised image is
$\hat{\mathbf{x}}$. Equation (4) emerges directly from the energy
minimization in Equation (1).
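
As an illustration of the Sparse Coding step, the following is a minimal pure-Python sketch of an OMP that stops on a residual-energy threshold, in the spirit of the criterion $\|\mathbf{R}_{ij}\mathbf{x} - \mathbf{D}\alpha\|_2^2 \le n(C\sigma)^2$. The function name `omp`, the list-of-lists data layout and the tolerance `1e-12` are assumptions made for this sketch and are not taken from the authors' implementation; atoms are assumed to have unit norm.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def omp(atoms, signal, eps):
    """Greedy sparse coding of `signal` over unit-norm `atoms`.

    Atoms are added until the squared residual norm is <= eps.
    Returns a dict {atom index: coefficient}.
    """
    residual = list(signal)
    chosen, basis = [], []  # selected indices and their orthonormalised atoms
    while dot(residual, residual) > eps and len(chosen) < len(atoms):
        # Selection: atom most correlated with the current residual.
        scores = [0.0 if j in chosen else abs(dot(d, residual))
                  for j, d in enumerate(atoms)]
        best = max(range(len(atoms)), key=scores.__getitem__)
        # Modified Gram-Schmidt: orthogonalise the new atom against the basis.
        q = list(atoms[best])
        for b in basis:
            c = dot(b, q)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if scores[best] < 1e-12 or norm < 1e-12:
            break  # nothing left to explain with the remaining atoms
        q = [qi / norm for qi in q]
        c = dot(q, residual)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
        chosen.append(best)
        basis.append(q)
    # Coefficients: back-substitution on R a = Q^T signal, where D_S = Q R.
    k = len(chosen)
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        rhs = dot(basis[i], signal)
        rhs -= sum(dot(basis[i], atoms[chosen[j]]) * coeffs[j]
                   for j in range(i + 1, k))
        coeffs[i] = rhs / dot(basis[i], atoms[chosen[i]])
    return dict(zip(chosen, coeffs))

# Toy usage: a signal built from atoms 0 and 2 is recovered exactly.
s = 1 / math.sqrt(2)
toy_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
toy_code = omp(toy_atoms, [2.0, 0.0, 3.0], eps=1e-9)
```

For a noisy patch of $n$ pixels one would call it with `eps = n * (C * sigma) ** 2` instead of the tiny tolerance used in the toy example.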

Since  it  is  well  accepted  that  image  information  spreads  across 
multiple  scales,  designing  a  K-SVD  type  of  algorithm  that  is  able  to 
adapt  and  capture  information  at  multiple  scales  is  the  goal  of  this 
paper. 


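Parameters: $\lambda$ (Lagrange multiplier); $C$ (noise gain); $J$ (number
of iterations); $k$ (number of atoms); $n$ (size of the patches).

Initialization: Set $\hat{\mathbf{x}} = \mathbf{y}$; initialize
$\hat{\mathbf{D}} = (\hat{\mathbf{d}}_l \in \mathbb{R}^{n \times 1})_{l \in 1 \ldots k}$
(e.g., redundant DCT).

Loop: Repeat $J$ times

•  Sparse Coding: Fix $\hat{\mathbf{D}}$ and use OMP to compute
   coefficients $\hat{\alpha}_{ij} \in \mathbb{R}^{k}$ for each patch by
   solving:

   $$\forall ij \quad \hat{\alpha}_{ij} = \arg\min_{\alpha} \|\alpha\|_0 \ \text{subject to} \ \|\mathbf{R}_{ij}\hat{\mathbf{x}} - \hat{\mathbf{D}}\alpha\|_2^2 \le n(C\sigma)^2. \qquad (2)$$

•  Dictionary Update: Fix all $\hat{\alpha}_{ij}$, and for each atom
   $\hat{\mathbf{d}}_l$, $l \in 1, 2, \ldots, k$ in $\hat{\mathbf{D}}$,

   -  Select the set of patches which use this atom,
      $\omega_l = \{[i,j] \mid \hat{\alpha}_{ij}(l) \neq 0\}$.

   -  For each patch $[i,j] \in \omega_l$, compute its residual,
      $\mathbf{e}^l_{ij} = \mathbf{R}_{ij}\hat{\mathbf{x}} - \hat{\mathbf{D}}\hat{\alpha}_{ij} + \hat{\mathbf{d}}_l\hat{\alpha}_{ij}(l)$.

   -  Set $\mathbf{E}_l$ as the matrix whose columns are the
      $\mathbf{e}^l_{ij}$, and $\hat{\alpha}^l$ the row vector whose
      elements are the $\hat{\alpha}_{ij}(l)$.

   -  Update $\hat{\mathbf{d}}_l$ and the $\hat{\alpha}_{ij}(l)$ by
      minimizing:

      $$(\hat{\mathbf{d}}_l, \hat{\alpha}^l) = \arg\min_{\alpha, \|\mathbf{d}\|_2 = 1} \|\mathbf{E}_l - \mathbf{d}\alpha\|_F^2. \qquad (3)$$

      This rank-one approximation is performed by a truncated SVD of
      $\mathbf{E}_l$.

Reconstruction: Perform a weighted average:

$$\hat{\mathbf{x}} = \Big(\lambda\mathbf{I} + \sum_{ij}\mathbf{R}_{ij}^T\mathbf{R}_{ij}\Big)^{-1}\Big(\lambda\mathbf{y} + \sum_{ij}\mathbf{R}_{ij}^T\hat{\mathbf{D}}\hat{\alpha}_{ij}\Big). \qquad (4)$$

Fig. 1. The single-scale K-SVD-based image denoising algorithm.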
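
The rank-one problem of Equation (3) can be illustrated without a full SVD routine. The sketch below swaps in a plain power iteration (not the truncated SVD named in Figure 1) to approximate the dominant singular pair of the residual matrix; the function name `rank_one_update`, the row-major list-of-lists layout and the fixed iteration count are assumptions of this example.

```python
import math

def rank_one_update(E, iters=50):
    """Approximate argmin ||E - d a||_F^2 subject to ||d||_2 = 1.

    E is an n x m matrix given as a list of rows. Returns (d, a), where
    d has length n and unit norm, and a = d^T E has length m.
    """
    n, m = len(E), len(E[0])
    columns = [[E[r][c] for r in range(n)] for c in range(m)]
    # Start from the column with the largest energy (non-zero unless E = 0).
    d = max(columns, key=lambda col: sum(v * v for v in col))
    for _ in range(iters):
        a = [sum(d[r] * E[r][c] for r in range(n)) for c in range(m)]  # a = d^T E
        d = [sum(E[r][c] * a[c] for c in range(m)) for r in range(n)]  # d = E a^T
        norm = math.sqrt(sum(v * v for v in d))
        if norm == 0.0:
            raise ValueError("E is the zero matrix; no atom can be extracted")
        d = [v / norm for v in d]
    a = [sum(d[r] * E[r][c] for r in range(n)) for c in range(m)]
    return d, a

# Toy usage: an exactly rank-one matrix, outer product of (0.6, 0.8) and (1, 2, 3).
atom, row = rank_one_update([[0.6, 1.2, 1.8], [0.8, 1.6, 2.4]])
```

In the dictionary update, `atom` would replace the current atom and `row` would replace the coefficients of the patches that use it.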


Fig.  2.  Quadtree  model  chosen  for  the  multiscale. 


3.  THE  MULTISCALE  SPARSE  REPRESENTATION 

One simple and naive strategy to introduce multiscale analysis consists
of using big patches with a high redundancy factor ($k/n$), and hoping
for the appearance of intrinsic multiple scales among the learned
dictionary’s atoms. However, we have observed no significant differences
between the results with the parameters $\{n = 8 \times 8, k = 256\}$
compared to $\{n = 16 \times 16, k = 1024\}$. A number of reasons might
explain the “failure” of this direct approach. First, it might be that
for low dimensions (small $n$) there is no need for multiscale structure
for representation and denoising, with such structure becoming more
crucial as the dimension grows. In that respect, $16 \times 16$ blocks
might not be enough for the original K-SVD algorithm to show the
multiscale structure. Another explanation is that the K-SVD may be
trapped in a local minimum. By explicitly imposing such multiscale
structure, we may help in this regard. This leads us naturally to the
proposed framework. We note that learning multiscale dictionaries is
important per se, also for applications beyond image denoising.

3.1.  The  basic  model 

In this paper we focus on the use of different sizes of atoms
simultaneously.¹ Considering the design of a patch-based
representation/denoising framework, we put forward a simple Quadtree
model on large patches (Figure 2). This is a classical data structure,
also used in wedgelets for example [11]. A fixed number of scales, $N$,
is chosen that corresponds to $N$ different sizes of atoms. A big patch
of size $n$ pixels is divided along the tree into sub-patches of sizes
$n_s = n/4^s$, where $s = 0, \ldots, N-1$ is the depth in the tree. Then,
one different dictionary $\hat{\mathbf{D}}_s$ composed of $k_s$ atoms of
size $n_s$ is built at each scale. The original K-SVD exploits the
overlapping/shift-invariant sparsity of the patches’ representation,
which has been found to be prominent for denoising [2, 3, 12]. One asset
of our multiscale model is that it does not allow for all possible shifts
for the sub-patches inside one large patch, preventing them from
constantly adapting their position to the noisy patch. Therefore, this
structure makes it possible to enforce and exploit the
overlapping/shift-invariant sparsity at each scale.
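
To make the Quadtree layout concrete, the following sketch enumerates the sub-patches of one large square patch. It assumes that the patch side $\sqrt{n}$ is divisible by $2^{N-1}$, so that at depth $s$ there are $4^s$ sub-patches of $n/4^s$ pixels each; the function name `quadtree_subpatches` and the tuple layout are illustrative choices, not part of the paper.

```python
def quadtree_subpatches(side, num_scales):
    """List the sub-patches of a side x side patch in a Quadtree.

    Returns tuples (s, p, top, left, sub_side): depth s, position index p
    (row-major within the depth), top-left corner inside the large patch,
    and side length of the sub-patch.
    """
    if side % (2 ** (num_scales - 1)) != 0:
        raise ValueError("side must be divisible by 2**(num_scales - 1)")
    result = []
    for s in range(num_scales):
        per_axis = 2 ** s             # 2^s sub-patches per axis, 4^s in total
        sub_side = side // per_axis   # each sub-patch then has n / 4^s pixels
        for p in range(per_axis * per_axis):
            row, col = divmod(p, per_axis)
            result.append((s, p, row * sub_side, col * sub_side, sub_side))
    return result

# Example: a 16 x 16 patch with N = 3 scales gives 1 + 4 + 16 sub-patches.
layout = quadtree_subpatches(16, 3)
```

Each tuple identifies where an atom of the corresponding scale may be placed inside the large patch, which is why sub-patches cannot shift freely.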

The  overall  idea  of  the  multiscale  algorithm  we  propose  stays  as 
close  as  possible  to  the  original  K-SVD  algorithm,  Figure  1,  with  an 
attempt  to  exploit  the  several  existing  scales.  The  following  are  the 
key  modifications  to  the  basic  algorithm: 

Sparse Coding: This remains unchanged if we introduce some new notations.
In Equation (2) assume that $\mathbf{R}_{ij}$ remains the matrix that
extracts the patch of size $n_0 = n$ with coordinates $[i,j]$. The
dictionary $\hat{\mathbf{D}}$ is a joint one, composed of all the atoms of
all the dictionaries
$\hat{\mathbf{D}}_s = (\hat{\mathbf{d}}_{sl} \in \mathbb{R}^{n_s \times 1})_{l \in 1 \ldots k_s}$
located at every possible position in the Quadtree. For the scale $s$,
there exist $4^s$ such positions, and we denote their index as $p$. This
makes a total of $\sum_{s=0}^{N-1} 4^s k_s$ atoms in $\hat{\mathbf{D}}$.
The OMP is implemented efficiently using a Modified Gram-Schmidt
algorithm [13]. For each patch, this step can be achieved in
$O(nk\|\alpha\|_0)$ operations.

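Dictionary Update: This step is slightly changed, as we update each atom
$\hat{\mathbf{d}}_{sl}$ ($1 \le l \le k_s$) in each scale (from
$s = N - 1$ downwards), by:

•  Select the set of sub-patches from the scale $s$ that use the $l$-th
   atom, $\omega_{sl} = \{[i,j,s,p] \mid \hat{\alpha}_{ij}(s,l,p) \neq 0\}$,
   where $[i,j,s,p]$ denotes the sub-patch at the scale $s$ and position
   $p$ from the patch $[i,j]$, and $\hat{\alpha}_{ij}(s,l,p)$ is the
   coefficient corresponding to the atom $\hat{\mathbf{d}}_{sl}$.

•  For each sub-patch $[i,j,s,p] \in \omega_{sl}$, compute

   $$\mathbf{e}^l_{ijsp} = \mathbf{T}_{sp}(\mathbf{R}_{ij}\hat{\mathbf{x}} - \hat{\mathbf{D}}\hat{\alpha}_{ij}) + \hat{\mathbf{d}}_{sl}\hat{\alpha}_{ij}(s,l,p),$$

   where $\mathbf{T}_{sp} \in \{0,1\}^{n_s \times n_0}$ is a binary matrix
   which extracts the sub-patch $[i,j,s,p]$ from a patch $[i,j]$.

•  Set $\mathbf{E}_{sl}$ as the matrix whose columns are the
   $\mathbf{e}^l_{ijsp}$, and $\hat{\alpha}^{sl}$ the row vector whose
   elements are the $\hat{\alpha}_{ij}(s,l,p)$.

•  Update $\hat{\mathbf{d}}_{sl}$ and the $\hat{\alpha}_{ij}(s,l,p)$ using
   an SVD as before:

   $$(\hat{\mathbf{d}}_{sl}, \hat{\alpha}^{sl}) = \arg\min_{\alpha, \|\mathbf{d}\|_2 = 1} \|\mathbf{E}_{sl} - \mathbf{d}\alpha\|_F^2.$$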

¹ In a separate work we also consider using a multiscale pyramid and
learning dictionaries at all the pyramid scales (see also [8]). Results
along this direction will be reported elsewhere.


Reconstruction: Remains the same as in Equation (4), while using the new
notation just introduced. Note that each patch is reconstructed from
multiple scales, and since a pixel belongs to multiple (overlapping)
patches, it is reconstructed with multiple scales and at multiple
positions.
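
Because $\lambda\mathbf{I} + \sum_{ij}\mathbf{R}_{ij}^T\mathbf{R}_{ij}$ is a diagonal matrix (each $\mathbf{R}_{ij}^T\mathbf{R}_{ij}$ simply marks the pixels covered by patch $[i,j]$), the weighted average of Equation (4) reduces to a per-pixel division. A minimal sketch, assuming square patches stored row by row and a hypothetical dictionary `patch_estimates` that maps each top-left corner to the approximation of that patch:

```python
def reconstruct(noisy, patch_estimates, patch_side, lam):
    """Per-pixel form of x = (lam*I + sum R^T R)^-1 (lam*y + sum R^T D alpha).

    noisy: 2-D list (rows of pixel values), the image y.
    patch_estimates: {(i, j): flat list of patch_side**2 values}, the
        approximation of the patch whose top-left corner is (i, j).
    """
    height, width = len(noisy), len(noisy[0])
    numer = [[lam * v for v in row] for row in noisy]   # lam * y
    denom = [[lam] * width for _ in range(height)]      # diagonal of lam * I
    for (i, j), estimate in patch_estimates.items():
        for di in range(patch_side):
            for dj in range(patch_side):
                numer[i + di][j + dj] += estimate[di * patch_side + dj]
                denom[i + di][j + dj] += 1.0            # one more covering patch
    return [[numer[r][c] / denom[r][c] for c in range(width)]
            for r in range(height)]

# Example: 3 x 3 zero image, four overlapping 2 x 2 patches estimated as all ones.
estimates = {(i, j): [1.0] * 4 for i in range(2) for j in range(2)}
restored = reconstruct([[0.0] * 3 for _ in range(3)], estimates, 2, lam=1.0)
```

In the example the centre pixel is covered by four patches and becomes 4/5, while a corner pixel is covered by one patch and becomes 1/2, which shows how $\lambda$ balances the noisy image against the patch estimates.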

The computational time of the Sparse Coding dominates that of the
Dictionary Update and the Reconstruction stages. The total complexity is
therefore $O\big((\sum_{s=0}^{N-1} k_s)\, nLJM\big)$, where $L$ is the
average sparsity factor (number of coefficients obtained in the
decomposition), and $M$ is the number of patches processed.

3.2.  Additional  Algorithmic  Improvements 

Compared to the original K-SVD algorithm [2], we introduce some
additional refinements, which further improve the result without
increasing the computational cost.

First, we find it useful to force the presence of a constant (DC) atom in
each dictionary, and to give it a preference by multiplying this atom by
a constant (2.5 in our examples) during the selection procedure of the
OMP (refer to [9]). This makes sense since a constant atom does not
introduce any noise in a reconstruction.
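
One way to read this preference is as a reweighting of the correlations in the OMP selection step. The sketch below is an interpretation under that assumption: the correlation of the DC atom with the residual is multiplied by a gain (2.5 here) before the maximum is taken, while the atom itself is left unchanged. The function name `select_atom` and its arguments are hypothetical.

```python
def select_atom(atoms, residual, dc_index, dc_gain=2.5):
    """Index of the atom with the largest (possibly boosted) correlation."""
    def score(j):
        corr = abs(sum(a * r for a, r in zip(atoms[j], residual)))
        return corr * dc_gain if j == dc_index else corr
    return max(range(len(atoms)), key=score)

# Toy example in dimension 4: atom 0 is the unit-norm DC atom.
toy = [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 0.0]]
r = [1.0, 0.2, 0.2, 0.2]
plain = select_atom(toy, r, dc_index=0, dc_gain=1.0)   # correlations 0.8 vs 1.0
boosted = select_atom(toy, r, dc_index=0)              # 0.8 * 2.5 = 2.0 vs 1.0
```

Without the gain the second atom wins the selection; with the gain the DC atom is chosen first.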

Secondly, as discussed in [3], the stopping criterion during the OMP is
based on the norm of an $n$-dimensional Gaussian vector, which is
distributed according to the generalized Rayleigh law. This means that
one has to stop the approximation when the residual reaches a fuzzy
sphere. But according to this law, the bigger $n$ is, the thinner the
sphere is, and the more accurate the stopping criterion
$\sqrt{n}\,C(n)\,\sigma$ becomes ($C$ is a parameter that depends on
$n$). Thus one asset of increasing $n$ through our multiscale scheme is
to provide an improved stopping criterion. It is actually not necessary
to perform a complete multiscale algorithm to take advantage of this
property. During the Sparse Coding stage, instead of processing each
patch separately, one can choose to process some adjacent sets of
non-overlapping patches simultaneously and consider them as a larger
patch (and therefore associated with a better stopping criterion). In
practice, we choose $m$ adjacent patches of size $n$, and we first
process them independently using their own stopping criterion
$\sqrt{n}\,C(n)\,\sigma$. Then, as long as the cumulative error of the
$m$ patches is larger than the (better) stopping criterion
$\sqrt{nm}\,C(nm)\,\sigma$, we refine the approximation by progressively
adding terms, one at a time, to the sparse expansion of the worst of the
$m$ patches. Then we consider a new set of $m$ patches and continue the
sparse approximation. This does not increase the complexity of the
algorithm and provides noticeable improvements.
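
The refinement pass described above can be sketched as follows. Each patch is assumed to have already been coded with its own criterion; the hypothetical `ToyPatch` class stands in for a real OMP state by replaying a fixed sequence of squared residual norms, and `c_nm` denotes the value $C(nm)$, which must be supplied by the caller.

```python
class ToyPatch:
    """Stand-in for an OMP state: replays squared residual norms per added term."""

    def __init__(self, errors):
        self.errors = list(errors)  # errors[t] = squared residual after t extra terms
        self.terms = 0

    @property
    def error(self):
        return self.errors[self.terms]

    def add_term(self):
        """Add one term to the expansion; return False if none is available."""
        if self.terms + 1 >= len(self.errors):
            return False
        self.terms += 1
        return True

def refine_group(patches, n, c_nm, sigma):
    """Add terms to the worst patch until the joint criterion is met.

    Stops when the cumulative squared error of the m patches is at most
    n * m * (c_nm * sigma)**2, i.e. the joint residual norm is within
    sqrt(n * m) * c_nm * sigma.
    """
    target = n * len(patches) * (c_nm * sigma) ** 2
    while sum(p.error for p in patches) > target:
        worst = max(patches, key=lambda p: p.error)
        if not worst.add_term():
            break  # the worst patch cannot be refined further
    return [p.terms for p in patches]

# Two patches of n = 2 pixels, C(nm) = 1.5, sigma = 1: target energy is 9.
group = [ToyPatch([10.0, 4.0, 1.0]), ToyPatch([6.0, 2.0])]
added = refine_group(group, n=2, c_nm=1.5, sigma=1.0)
```

In this toy run one term is added to each patch: the cumulative error goes from 16 to 10 to 6, at which point it falls below the joint target.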

4.  EXPERIMENTAL  RESULTS 

We now present denoising results obtained within the proposed multiscale
sparsity framework. In Table 1, our results for $N = 1$ (single-scale)
and $N = 2$ scales are compared to those presented in [2, 5, 6, 7]. The
best results are shared between our algorithm and [7], where [7] performs
better only for very high noise (beyond the normal expected one) and on
the images “barbara” and “lena.” For $N = 1$, $n = 8 \times 8$. For
$N = 2$, $n$ is $10 \times 10$ for $\sigma = 5$, $12 \times 12$ for
$\sigma = 10$, $16 \times 16$ for $15 \le \sigma \le 25$, and
$20 \times 20$ for $\sigma \ge 50$. The results from our experiments and
[2, 7] reported in Table 1 are averaged over 5 experiments for each image
and each level of noise. During our experiments, the number of iterations
$J$ was fixed to 20, the number of atoms $k_s$ for each scale was set to
256 and the parameter $\lambda$ to $0.45n^2/\sigma$. The parameter $m$,
representing the number of patches simultaneously processed, and $C$, are
reported within the table. The initial dictionaries used during these
experiments are the results of off-line training on a large generic
database of images [2, 3]. The so-called sparsity factor $L$ for this
off-line training was set to $L = 6$ for $N = 1$, $L = 20$ for $N = 2$,
and $L = 30$ for $N = 3$. Some visual results for $N = 2$ are presented
in Figure 3, while further improvements provided by the use of $N = 3$
scales and $n = 20 \times 20$ (PSNR = 36.93) compared to $N = 2$ and
$n = 12 \times 12$ (PSNR = 36.57) are shown in Figure 4. One example of a
multiscale learned dictionary is presented in Figure 5.
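
The PSNR figures quoted here and in Table 1 follow the usual definition $10\log_{10}(\mathrm{peak}^2/\mathrm{MSE})$. A small sketch, assuming 8-bit images with a peak value of 255 (an assumption of this example; the peak is not stated in the text):

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D lists."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(reference, estimate)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# A uniform error of one grey level gives MSE = 1, i.e. 20*log10(255) dB.
value = psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]])
```

Since the measure is logarithmic, a gain of a few tenths of a dB, as between the two PSNR values quoted above, corresponds to a reduction of the mean squared error by several percent.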


Fig.  3.  Denoising  results  for  N  =  2. 


(a) Noisy, $\sigma = 10$  (b) $N = 3$  (c) Zoom on (b)

(d) Original  (e) $N = 2$  (f) Zoom on (e)

Fig. 4. Denoising results with $N = 3$ and $N = 2$.

Our implementation was coded in C++ using the Intel Math Kernel Library.
For $N = 1$, during one experiment on the $256 \times 256$ image “house”,
for $\sigma = 25$, one Sparse Coding step takes approximately 3s on an
Opteron 2.4GHz. With the same image and same level of noise, with
$N = 2$, this time becomes 60s. In both cases, the Dictionary Update
takes less than 0.5s. Thus our algorithm is slower than [7], and
improving on this is part of ongoing efforts in our group. To achieve
this goal one could define a criterion to deactivate some scales during
the OMP. Code profiling shows that more than 85% of the computational
time is usually devoted to matrix-vector multiplication due to the
computation of scalar products in the OMP. This can be significantly
improved using standard nearest-neighbor approximation algorithms, which
often provide two or more orders of magnitude improvement. In addition,
NVIDIA is at the moment developing a parallel linear algebra library
which takes advantage of graphics cards and could potentially provide a
speedup factor of more than 20 for these multiplications. We plan to
provide a parallel version of the algorithm which will be able to take
advantage of the new multi-core processors. To conclude, we do not
anticipate the computational cost of the algorithm to be a bottleneck in
the near future.

  σ      C    m | one block of three columns per image (five images);
                | top row: [5] [6] [7]; bottom row: [2], N = 1, N = 2
  5  1.128    1 | 38.65 37.62 39.56 | 37.31 37.34 37.83 | 38.49 37.91 38.62 | 37.79 37.12 38.16 | 36.97 36.14 37.19
     1.069    3 | 39.37 39.62 39.84 | 37.78 37.94 38.14 | 38.60 38.60 38.70 | 38.08 37.59 38.11 | 37.22 37.13 37.26
 10  1.128    1 | 35.35 35.26 36.37 | 33.77 34.07 34.38 | 35.61 35.18 35.81 | 34.03 33.79 34.86 | 33.58 33.09 33.76
     1.042    3 | 35.98 36.24 36.54 | 34.28 34.49 34.60 | 35.47 35.63 35.75 | 34.42 34.35 34.57 | 33.64 33.81 33.87
 15  1.041    4 | 33.64 34.08 34.75 | 31.74 32.13 32.35 | 33.90 33.70 34.20 | 31.86 31.80 33.05 | 31.70 31.44 31.92
     1.026    4 | 34.32 34.59 34.87 | 32.22 32.41 32.41 | 33.70 33.90 34.08 | 32.37 32.47 32.58 | 31.73 31.99 32.02
 20  1.023    4 | 32.39 32.90 33.54 | 30.31 30.59 30.84 | 32.66 32.64 33.02 | 30.32 30.37 31.71 | 30.38 30.12 30.61
     1.026    4 | 33.20 33.45 33.67 | 30.82 31.10 31.11 | 32.38 32.69 32.86 | 30.83 31.11 31.24 | 30.36 30.69 30.77
 25  1.023    4 | 31.40 32.44 32.66 | 29.21 29.95 29.82 | 31.69 31.66 32.06 | 29.13 29.96 30.68 | 29.37 29.66 29.64
     1.020    4 | 32.15 32.44 32.75 | 29.73 29.95 30.05 | 31.32 31.66 31.89 | 29.60 29.95 30.17 | 29.28 29.66 29.79
 50  1.018    4 | 28.26 28.67 29.68 | 25.90 25.29 26.45 | 28.61 28.38 29.10 | 25.48 24.09 27.50 | 26.38 25.93 26.63
     1.010    5 | 27.95 28.25 29.43 | 26.13 26.40 26.62 | 27.79 28.11 28.75 | 25.47 26.04 26.80 | 25.95 26.34 26.74
100  1.018    4 | 25.11 23.08 25.96 | 22.66 20.51 23.06 | 25.64 23.32 25.91 | 22.61 20.64 24.11 | 23.75 21.78 23.88
     1.008    5 | 23.71 23.69 24.73 | 21.75 22.05 22.57 | 24.46 24.48 25.13 | 21.89 22.04 22.88 | 22.81 22.95 23.65

Table 1. PSNR results of our denoising algorithm. Each case (image and
noise level) is divided into six parts: the top row for each part
presents the results from, respectively, [5, 6, 7] (from left to right).
The bottom row presents successively the original K-SVD [2], our results
for $N = 1$ (single-scale), and then $N = 2$ scales. Each time the best
result is in bold. The values of the parameters $C$ and $m$ are reported
in the second and third columns: inside these, the top part of each cell
is devoted to $N = 1$ and the lower part to $N = 2$.


Fig. 5. One learned multiscale dictionary.


5.  CONCLUSION  AND  FUTURE  DIRECTIONS 

In this paper we presented a K-SVD based algorithm that is able to learn
multiscale sparse image representations. Using a shift-invariant sparsity
prior on natural images, the proposed framework achieves state-of-the-art
denoising results. Our current efforts are devoted in part to speeding up
the algorithm following the approaches mentioned above, and to the
extension to multiscale sparse representation of color images (see [3]
for the single-scale case). Another direction we are pursuing is to
combine the K-SVD with image pyramids. Results in these directions will
be reported soon.